Table of contents:¶

  1. Global Statistics

    Compute general statistics such as Coronavirus Confirmed Cases Around The Globe and the overall breakdown of Coronavirus Cases

  2. Visualize Data by Country

    Plot five charts for each country: Coronavirus Confirmed Cases, Daily New Coronavirus Confirmed Cases, Coronavirus Deaths, Daily New Coronavirus Deaths, and Active Coronavirus Cases

    2.1. USA (The Leader)
    2.2. China (The Origin)
    2.3. UK (The Mutant)
    2.4. Italy (The Early Chaos)
    2.5. India (The Midway Chaos)
    2.6. Australia (The Latest Chaos)
    2.7. France (My Country of Residence)

  3. Visualize Data by Continent

    Plot one chart per continent: Coronavirus Confirmed Cases for each country

    3.1. Asia
    3.2. Europe
    3.3. Africa
    3.4. North America
    3.5. South America
    3.6. Australia/Oceania

  4. Most Affected Countries

    Finding the most affected countries by COVID

  5. Current and History of Distribution of Active Cases

    Plotting the chart to show the distribution of COVID around the world

Import the necessary libraries

In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objs as go

import os
import math

Parse the datetime format

In [2]:
from datetime import datetime, timedelta
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d')

Read the daily data from the CSV file and display the DataFrame

In [3]:
df = pd.read_csv('data/worldometer_coronavirus_daily_data.csv',
                 parse_dates=['date'], date_parser=dateparse)

df
Out[3]:
              date      country  cumulative_total_cases  daily_new_cases  active_cases  cumulative_total_deaths  daily_new_deaths
0       2020-02-15  Afghanistan                     0.0              NaN           0.0                      0.0               NaN
1       2020-02-16  Afghanistan                     0.0              NaN           0.0                      0.0               NaN
2       2020-02-17  Afghanistan                     0.0              NaN           0.0                      0.0               NaN
3       2020-02-18  Afghanistan                     0.0              NaN           0.0                      0.0               NaN
4       2020-02-19  Afghanistan                     0.0              NaN           0.0                      0.0               NaN
...            ...          ...                     ...              ...           ...                      ...               ...
184782  2022-05-10     Zimbabwe                248642.0            106.0         963.0                   5481.0               2.0
184783  2022-05-11     Zimbabwe                248778.0            136.0        1039.0                   5481.0               0.0
184784  2022-05-12     Zimbabwe                248943.0            165.0        1158.0                   5481.0               0.0
184785  2022-05-13     Zimbabwe                249131.0            188.0        1283.0                   5482.0               1.0
184786  2022-05-14     Zimbabwe                249206.0             75.0        1307.0                   5482.0               0.0

184787 rows × 7 columns
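
The date_parser argument used above was deprecated in pandas 2.0. As a minimal sketch of a version-independent alternative, the dates can be parsed after loading with pd.to_datetime. The in-memory CSV below is a hypothetical stand-in for the real file, with only three of its columns.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for worldometer_coronavirus_daily_data.csv.
csv_text = (
    "date,country,daily_new_cases\n"
    "2020-02-15,Afghanistan,\n"
    "2020-02-16,Afghanistan,3\n"
)

sample = pd.read_csv(io.StringIO(csv_text))

# Convert the string column to datetimes with an explicit format.
sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m-%d")

print(sample.dtypes)
```

An explicit format keeps the parsing strict: a malformed date raises an error instead of being guessed.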

Read the summary data from the CSV file and display the DataFrame

In [4]:
df_summary = pd.read_csv('data/worldometer_coronavirus_summary_data.csv')

df_summary
Out[4]:
                       country          continent  total_confirmed  total_deaths  total_recovered  active_cases  serious_or_critical  total_cases_per_1m_population  total_deaths_per_1m_population  total_tests  total_tests_per_1m_population  population
0                  Afghanistan               Asia           179267        7690.0         162202.0        9375.0               1124.0                           4420                           190.0     951337.0                        23455.0    40560636
1                      Albania             Europe           275574        3497.0         271826.0         251.0                  2.0                          95954                          1218.0    1817530.0                       632857.0     2871945
2                      Algeria             Africa           265816        6875.0         178371.0       80570.0                  6.0                           5865                           152.0     230861.0                         5093.0    45325517
3                      Andorra             Europe            42156         153.0          41021.0         982.0                 14.0                         543983                          1974.0     249838.0                      3223924.0       77495
4                       Angola             Africa            99194        1900.0          97149.0         145.0                  NaN                           2853                            55.0    1499795.0                        43136.0    34769277
...                        ...                ...              ...           ...              ...           ...                  ...                            ...                             ...          ...                            ...         ...
221  Wallis And Futuna Islands  Australia/Oceania              454           7.0            438.0           9.0                  NaN                          41755                           644.0      20508.0                      1886140.0       10873
222             Western Sahara             Africa               10           1.0              9.0           0.0                  NaN                             16                             2.0          NaN                            NaN      624681
223                      Yemen               Asia            11819        2149.0           9009.0         661.0                 23.0                            381                            69.0     265253.0                         8543.0    31049015
224                     Zambia             Africa           320591        3983.0         315997.0         611.0                  NaN                          16575                           206.0    3452554.0                       178497.0    19342381
225                   Zimbabwe             Africa           249206        5482.0         242417.0        1307.0                 12.0                          16324                           359.0    2287793.0                       149863.0    15265849

226 rows × 12 columns

The code adds a new column 'continent' to the DataFrame 'df', where the value for each row is fetched from the 'continent' column of another DataFrame 'df_summary' based on a matching 'country' value. The updated 'df' DataFrame is then displayed.

In [5]:
df['continent'] = df.apply(lambda row: df_summary[df_summary.country == row.country].iloc[0].continent, axis=1)

df
Out[5]:
              date      country  cumulative_total_cases  daily_new_cases  active_cases  cumulative_total_deaths  daily_new_deaths  continent
0       2020-02-15  Afghanistan                     0.0              NaN           0.0                      0.0               NaN       Asia
1       2020-02-16  Afghanistan                     0.0              NaN           0.0                      0.0               NaN       Asia
2       2020-02-17  Afghanistan                     0.0              NaN           0.0                      0.0               NaN       Asia
3       2020-02-18  Afghanistan                     0.0              NaN           0.0                      0.0               NaN       Asia
4       2020-02-19  Afghanistan                     0.0              NaN           0.0                      0.0               NaN       Asia
...            ...          ...                     ...              ...           ...                      ...               ...        ...
184782  2022-05-10     Zimbabwe                248642.0            106.0         963.0                   5481.0               2.0     Africa
184783  2022-05-11     Zimbabwe                248778.0            136.0        1039.0                   5481.0               0.0     Africa
184784  2022-05-12     Zimbabwe                248943.0            165.0        1158.0                   5481.0               0.0     Africa
184785  2022-05-13     Zimbabwe                249131.0            188.0        1283.0                   5482.0               1.0     Africa
184786  2022-05-14     Zimbabwe                249206.0             75.0        1307.0                   5482.0               0.0     Africa

184787 rows × 8 columns
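
The row-wise apply above filters df_summary once for every one of the 184,787 rows, which can be slow. A possible faster equivalent, sketched here on small hypothetical frames (daily and summary are illustrative names, not from the notebook), builds a country-to-continent lookup once and maps it over the country column.

```python
import pandas as pd

# Hypothetical miniature versions of df and df_summary.
daily = pd.DataFrame({
    "country": ["Afghanistan", "Zimbabwe", "Afghanistan"],
    "daily_new_cases": [1.0, 2.0, 3.0],
})
summary = pd.DataFrame({
    "country": ["Afghanistan", "Zimbabwe"],
    "continent": ["Asia", "Africa"],
})

# Series indexed by country; assumes each country appears once in summary.
country_to_continent = summary.set_index("country")["continent"]

# Vectorized lookup: one pass over the column instead of one filter per row.
daily["continent"] = daily["country"].map(country_to_continent)

print(daily)
```

One behavioral difference: a country missing from the summary yields NaN here, whereas the apply version raises an IndexError.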

1. Global Statistics ¶

The code generates a pie chart using Plotly's graph objects (go). It sets the labels and values for different categories, configures various visual and interactive properties of the chart, and displays the chart using fig.show().

In [6]:
trace = go.Pie(labels=['Total Recovered', 'Total Active', 'Total Deaths'],
               values=[df_summary.total_recovered.sum(), df_summary.active_cases.sum(), df_summary.total_deaths.sum()],
               title="<b>Coronavirus Cases</b>",
               title_font_size=18,
               hovertemplate="<b>%{label}</b><br>%{value}<br><i>%{percent}</i>",
               #hoverinfo='percent+value+label',
               textinfo='percent',
               textposition='inside',
               hole=0.6,
               showlegend=True,
               marker=dict(colors=["#8dd3c7", "#ffffb3", "#fb8072"],
                           line=dict(color='#000000',
                                     width=2),
                          ),
               name=""
              )
fig=go.Figure(data=[trace])
fig.show()
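
The percentages shown on the pie chart are each category's total divided by the sum of all three totals. A minimal sketch of that arithmetic, using only the first three rows of the summary table (Afghanistan, Albania, Algeria) rather than the full dataset:

```python
import pandas as pd

# First three rows of the summary table shown earlier.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "total_recovered": [162202.0, 271826.0, 178371.0],
    "active_cases": [9375.0, 251.0, 80570.0],
    "total_deaths": [7690.0, 3497.0, 6875.0],
})

totals = sample[["total_recovered", "active_cases", "total_deaths"]].sum()

# Each slice's share is its total divided by the grand total.
shares = totals / totals.sum()

for label, share in shares.items():
    print(f"{label}: {share:.2%}")
```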

The code defines a function add_commas that takes a number as input and adds commas as thousand separators to it. It then prints statements with formatted numbers and country references related to total deaths, active cases, and total recoveries based on data from the df_summary DataFrame. The function add_commas is used to format the numbers with commas.

In [7]:
def add_commas(num):
    out = ""
    counter = 0
    for n in num[::-1]:
        counter += 1
        if counter == 4:
            counter = 1
            out = "," + out
        out = n + out
    return out

print(f"As of {df.date.max().strftime('%Y-%m-%d')}, here are the numbers:\n")

print(add_commas(str(int(df_summary.total_deaths.sum()))), "total deaths. That is more than the entire population of ", end="")
deaths_ref = df_summary[df_summary.population < df_summary.total_deaths.sum()].sort_values("population", ascending=False).iloc[:2]
print(deaths_ref.iloc[0].country, f"({add_commas(str(int(deaths_ref.iloc[0].population)))}) or",
      deaths_ref.iloc[1].country, f"({add_commas(str(int(deaths_ref.iloc[1].population)))})!")

print(add_commas(str(int(df_summary.active_cases.sum()))), "active cases. You can think of that as if the entire population of ", end="")
active_ref = df_summary[df_summary.population < df_summary.active_cases.sum()].sort_values("population", ascending=False).iloc[:2]
print(active_ref.iloc[0].country, f"({add_commas(str(int(active_ref.iloc[0].population)))}), or",
      active_ref.iloc[1].country, f"({add_commas(str(int(active_ref.iloc[1].population)))}), were sick right now!")

print(add_commas(str(int(df_summary.total_recovered.sum()))), "total recoveries. It's as if the entire population of ", end="")
recover_ref = df_summary[df_summary.population < df_summary.total_recovered.sum()].sort_values("population", ascending=False).iloc[:2]
print(recover_ref.iloc[0].country, f"({add_commas(str(int(recover_ref.iloc[0].population)))}), or",
      recover_ref.iloc[1].country, f"({add_commas(str(int(recover_ref.iloc[1].population)))}), went through and recovered from Covid-19!")
As of 2022-05-14, here are the numbers:

6,288,083 total deaths. That is more than the entire population of Singapore (5,936,034) or Denmark (5,830,190)!
13,996,500 active cases. You can think of that as if the entire population of Guinea (13,795,931), or Rwanda (13,549,956), were sick right now!
460,397,633 total recoveries. It's as if the entire population of USA (334,617,623), or Indonesia (278,910,317), went through and recovered from Covid-19!
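
Python's format specification also supports a thousands separator directly, so the same output can be produced without a hand-written loop. The helper name format_thousands below is hypothetical and shown only as an alternative sketch to add_commas.

```python
def format_thousands(value):
    """Format a number with commas as thousands separators."""
    # int() drops the fractional part, mirroring str(int(...)) in the notebook.
    return f"{int(value):,}"


print(format_thousands(6288083.0))
print(format_thousands(13996500))
print(format_thousands(999))
```

Unlike add_commas, this version takes a number rather than a string, and it also handles negative values correctly.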

The code computes the logarithm (base 2) of the 'total_confirmed' column in the 'df_summary' DataFrame and assigns it to a new column 'log(Total Confirmed)'. It also applies the 'add_commas' function to format the 'total_confirmed' values with commas and assigns the formatted values to a new column 'Total Confirmed'. Then, it creates a choropleth map using Plotly Express (px) based on the 'df_summary' DataFrame, where the color represents the logarithm of the total confirmed cases. The map is customized with a color scale, hover information, and a color bar with tick labels. Finally, the map is displayed using fig.show().

In [8]:
df_summary['log(Total Confirmed)'] = np.log2(df_summary['total_confirmed'])
df_summary['Total Confirmed'] = df_summary['total_confirmed'].apply(lambda x: add_commas(str(x)))

fig = px.choropleth(df_summary,
                    locations="country",
                    color="log(Total Confirmed)",
                    locationmode = 'country names',
                    hover_name='country',
                    hover_data=['Total Confirmed'],
                    color_continuous_scale='reds',
                    title = '<b>Coronavirus Confirmed Cases Around The Globe</b>')


log_scale_vals = list(range(0,25,2))
scale_vals = (np.exp2(log_scale_vals)).astype(int).astype(str)

scale_vals = list(map(add_commas, scale_vals))

fig.update_layout(title_font_size=22,
                  margin={"r":20, "l":30},
                  coloraxis={#"showscale":False,
                            "colorbar":dict(title="<b>Confirmed Cases</b><br>",
                                            #range=[np.log(50), np.log(6400)],
                                            titleside="top",
                                            tickmode="array",
                                            tickvals=log_scale_vals,
                                            ticktext=scale_vals
                                        )},
                 )

fig.show()
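
The color bar works because the tick positions stay in log2 space while the tick labels show the original case counts. A minimal sketch of that mapping, using Python's built-in thousands separator in place of add_commas (an assumption made only to keep the example self-contained):

```python
import numpy as np

# Tick positions on the log2 color axis: 0, 2, 4, ..., 24.
log_ticks = list(range(0, 25, 2))

# Labels are the corresponding raw counts, 2**0 up to 2**24.
tick_labels = [f"{2 ** t:,}" for t in log_ticks]

# A country with 1,024 confirmed cases sits at position 10 on the axis.
position = float(np.log2(1024))

print(list(zip(log_ticks, tick_labels)))
print(position)
```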

The code generates a treemap using Plotly Express (px) based on the 'df_summary' DataFrame, representing the breakdown of total confirmed cases by country. The treemap is customized with a specified path, values, height, title, and color sequence. The update_traces method is used to set the text information displayed on the treemap, and the resulting visualization is displayed using fig.show().

In [9]:
fig = px.treemap(df_summary, path=["country"], values="total_confirmed", height = 750,
                 title="<b>Total Coronavirus Confirmed Cases Breakdown by Country</b>",
                 color_discrete_sequence = px.colors.qualitative.Set3)

fig.update_traces(textinfo = "label+text+value")
fig.show()

2. Visualize Data by Country ¶

The code defines a function plot_stats that takes a country as input. It generates multiple plots using Plotly and pandas based on the data for the specified country. The plots include cumulative total confirmed cases, daily new cases, cumulative total deaths, daily new deaths, and active cases. Each plot has a customized layout, title, and visual style. The resulting plots are displayed using fig.show().

In [10]:
def plot_stats(country):
    if country in ["USA", "UK"]:
        country_prefix = "the "
    else:
        country_prefix = ""
    # set_index returns a new frame, so the filtered slice is never modified in place
    df_country = df[df.country == country].set_index('date')

    # Plot 1
    if not all(df_country.cumulative_total_cases.isna()):
        layout = go.Layout(
            yaxis={'range':[0, df_country.cumulative_total_cases.iloc[-1] * 1.05],
                  'title':'Coronavirus Confirmed Cases'},
            xaxis={'title':''},
            )

        fig = px.area(df_country, x=df_country.index, y="cumulative_total_cases",
                      title=f"<b>Cumulative Total Confirmed Cases in {country_prefix}{country}<br>from {df_country.index[0].strftime('%Y-%m-%d')} till {df_country.index[-1].strftime('%Y-%m-%d')}</b>",
                      template='plotly_dark')

        fig.update_traces(line={'width':5})

        fig.update_layout(layout)
        fig.show()

    # Plot 2
    if not all(df_country.daily_new_cases.isna()):
        layout = go.Layout(
            yaxis={'range':[0, df_country.daily_new_cases.max() * 1.05],
                  'title':'Daily New Coronavirus Confirmed Cases'},
            xaxis={'title':''},
            template='plotly_dark',
            title=f"<b>Daily New Cases in {country_prefix}{country}<br>from {df_country.index[0].strftime('%Y-%m-%d')} till {df_country.index[-1].strftime('%Y-%m-%d')}</b>",
            )

        MA7 = df_country.daily_new_cases.rolling(7).mean().dropna().astype(int)

        fig = go.Figure()
        fig.add_trace(go.Bar(name="Daily Cases", x=df_country.index, y=df_country.daily_new_cases))
        fig.add_trace(go.Scatter(name="7-Day Moving Average", x=MA7.index, y=MA7, line=dict(width=3)))

        fig.update_layout(layout)
        fig.show()

    # Plot 3
    if not all(df_country.cumulative_total_deaths.isna()):
        layout = go.Layout(
            yaxis={'range':[0, df_country.cumulative_total_deaths.iloc[-1] * 1.05],
                  'title':'Coronavirus Deaths'},
            xaxis={'title':''},
            )

        fig = px.area(df_country, x=df_country.index, y="cumulative_total_deaths",
                      title=f"<b>Cumulative Total Deaths in {country_prefix}{country}<br>from {df_country.index[0].strftime('%Y-%m-%d')} till {df_country.index[-1].strftime('%Y-%m-%d')}</b>",
                      template='plotly_dark')

        fig.update_traces(line={'color':'red', 'width':5})

        fig.update_layout(layout)
        fig.show()

    # Plot 4
    if not all(df_country.daily_new_deaths.isna()):
        layout = go.Layout(
            yaxis={'range':[0, df_country.daily_new_deaths.max() * 1.05],
                  'title':'Daily New Coronavirus Deaths'},
            xaxis={'title':''},
            template='plotly_dark',
            title=f"<b>Daily Deaths in {country_prefix}{country}<br>from {df_country.index[0].strftime('%Y-%m-%d')} till {df_country.index[-1].strftime('%Y-%m-%d')}</b>",
            )

        MA7 = df_country.daily_new_deaths.rolling(7).mean().dropna().astype(int)

        fig = go.Figure()
        fig.add_trace(go.Bar(name="Daily Deaths", x=df_country.index, y=df_country.daily_new_deaths, marker_color='red'))
        fig.add_trace(go.Scatter(name="7-Day Moving Average", x=MA7.index, y=MA7, line={'width':3, 'color':'white'}))

        fig.update_layout(layout)
        fig.show()

    # Plot 5
    if not all(df_country.active_cases.isna()):
        layout = go.Layout(
            yaxis={'range':[0, df_country.active_cases.max() * 1.05],
                  'title':'Active Coronavirus Cases'},
            xaxis={'title':''},
            )

        fig = px.line(df_country, x=df_country.index, y="active_cases",
                      title=f"<b>Active Cases in {country_prefix}{country}<br>from {df_country.index[0].strftime('%Y-%m-%d')} till {df_country.index[-1].strftime('%Y-%m-%d')}</b>",
                      template='plotly_dark')

        fig.update_traces(line={'color':'yellow', 'width':5})

        fig.update_layout(layout)
        fig.show()
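
The 7-day moving average drawn over the daily bars is a rolling mean whose first six values are undefined and therefore dropped. A small sketch on hypothetical daily counts shows which dates survive and what values they carry:

```python
import pandas as pd

# Hypothetical ten days of daily new cases.
dates = pd.date_range("2022-05-01", periods=10, freq="D")
daily = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
                  index=dates, dtype=float)

# Mean of each trailing 7-day window; the first 6 entries are NaN and removed.
ma7 = daily.rolling(7).mean().dropna()

print(ma7)
```

Because dropna keeps the original date index, ma7.index can be used directly as the x values for the moving-average trace.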

2.1. USA (The Leader) ¶

The code calls the function plot_stats with the argument 'USA', which generates and displays multiple plots showing COVID-19 statistics for the United States, including cumulative total confirmed cases, daily new cases, cumulative total deaths, daily new deaths, and active cases.

In [11]:
plot_stats('USA')

2.2. China (The Origin) ¶

The code calls the function plot_stats with the argument 'China', which generates and displays multiple plots showing COVID-19 statistics for China, including cumulative total confirmed cases, daily new cases, cumulative total deaths, daily new deaths, and active cases.

In [12]:
plot_stats('China')

2.3. UK (The Mutant) ¶

The code calls the function plot_stats with the argument 'UK', which generates and displays multiple plots showing COVID-19 statistics for the United Kingdom, including cumulative total confirmed cases, daily new cases, cumulative total deaths, daily new deaths, and active cases.

In [13]:
plot_stats('UK')

2.4. Italy (The Early Chaos) ¶

The code calls the function plot_stats with the argument 'Italy', which generates and displays multiple plots showing COVID-19 statistics for Italy, including cumulative total confirmed cases, daily new cases, cumulative total deaths, daily new deaths, and active cases.

In [14]:
plot_stats('Italy')

2.5. India (The Midway Chaos) ¶

The code calls the function plot_stats with the argument 'India', which generates and displays multiple plots showing COVID-19 statistics for India, including cumulative total confirmed cases, daily new cases, cumulative total deaths, daily new deaths, and active cases.

In [15]:
plot_stats("India")

2.6. Australia (The Latest Chaos) ¶

The code calls the function plot_stats with the argument 'Australia', which generates and displays multiple plots showing COVID-19 statistics for Australia, including cumulative total confirmed cases, daily new cases, cumulative total deaths, daily new deaths, and active cases.

In [16]:
plot_stats("Australia")

2.7. France (My Country of Residence) ¶

The code calls the function plot_stats with the argument 'France', which generates and displays multiple plots showing COVID-19 statistics for France, including cumulative total confirmed cases, daily new cases, cumulative total deaths, daily new deaths, and active cases.

In [17]:
plot_stats("France")

3. Visualize Data by Continent ¶

The code defines a function plot_continent that takes a continent as input. It filters the data from the DataFrame df based on the specified continent and generates a line plot using Plotly Express (px). The plot shows the cumulative total confirmed cases over time for each country in the selected continent. The plot is customized with annotations, marker sizes, and various visual properties. The resulting plot is displayed using fig.show().

In [18]:
def plot_continent(continent):
    df_continent = df[df.continent == continent]
    fig = px.line(df_continent, x="date", y="cumulative_total_cases", color="country", #log_y=True,
                  line_group="country", hover_name="country", template="plotly_dark")

    annotations = []
    # Adding labels
    ys = []
    for tr in fig.select_traces():
        ys.append(tr.y[-1])
    y_scale = 0.155 / max(ys)
    for tr in fig.select_traces():
        # labeling the right_side of the plot
        size = max(1, int(math.log(tr.y[-1], 1.1) * tr.y[-1] * y_scale))
        annotations.append(dict(x=tr.x[-1] + timedelta(hours=int((2 + size/5) * 24)), y=tr.y[-1],
                                xanchor='left', yanchor='middle',
                                text=tr.name,
                                font=dict(family='Arial',
                                          size=7+int(size/2)
                                         ),
                                showarrow=False))
        fig.add_trace(go.Scatter(
            x=[tr.x[-1]],
            y=[tr.y[-1]],
            mode='markers',
            name=tr.name,
            marker=dict(color=tr.line.color, size=size)
        ))
    fig.update_traces(line={'width':1})
    fig.update_layout(annotations=annotations, showlegend=False, uniformtext_mode='hide',
                      title=f"<b>Cumulative Total Coronavirus Cases in {continent}<br>between {df_continent.date.min().strftime('%Y-%m-%d')} and {df_continent.date.max().strftime('%Y-%m-%d')}</b>",
                      yaxis={'title':'Coronavirus Confirmed Cases'},
                      xaxis={'title':''}
                     )
    fig.show()
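
The marker and label sizes in plot_continent come from one formula: the base-1.1 logarithm of a country's latest count, multiplied by that count and by a scale factor derived from the continent's largest count, with a floor of 1. The sketch below isolates that formula in a hypothetical helper named marker_size. It assumes the latest count is positive, since the logarithm is undefined for zero.

```python
import math


def marker_size(last_value, largest_last_value):
    """Marker size for a country, following the formula in plot_continent."""
    # Scale so that the largest country in the continent gets a moderate size.
    y_scale = 0.155 / largest_last_value
    return max(1, int(math.log(last_value, 1.1) * last_value * y_scale))


# The largest country in a continent versus a much smaller one.
print(marker_size(1_000_000, 1_000_000))
print(marker_size(1_000, 1_000_000))
```

Small countries collapse to the minimum size of 1, which keeps their markers and labels visible without crowding the plot.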

3.1. Asia ¶

The code calls the function plot_continent with the argument 'Asia', which generates and displays a line plot showing the cumulative total confirmed cases over time for each country in Asia. The plot is customized with annotations, marker sizes, and visual properties to highlight the data.

In [19]:
plot_continent("Asia")

3.2. Europe ¶

The code calls the function plot_continent with the argument 'Europe', which generates and displays a line plot showing the cumulative total confirmed cases over time for each country in Europe. The plot is customized with annotations, marker sizes, and visual properties to highlight the data.

In [20]:
plot_continent("Europe")

3.3. Africa ¶

The code calls the function plot_continent with the argument 'Africa', which generates and displays a line plot showing the cumulative total confirmed cases over time for each country in Africa. The plot is customized with annotations, marker sizes, and visual properties to highlight the data.

In [21]:
plot_continent("Africa")

3.4. North America ¶

The code calls the function plot_continent with the argument 'North America', which generates and displays a line plot showing the cumulative total confirmed cases over time for each country in North America.

In [22]:
plot_continent("North America")

3.5. South America ¶

The code calls the function plot_continent with the argument 'South America', which generates and displays a line plot showing the cumulative total confirmed cases over time for each country in South America.

In [23]:
plot_continent("South America")

3.6. Australia/Oceania ¶

The code calls the function plot_continent with the argument 'Australia/Oceania', which generates and displays a line plot showing the cumulative total confirmed cases over time for each country in Australia/Oceania.

In [24]:
plot_continent("Australia/Oceania")

4. Most Affected Countries ¶

The code sorts the df_summary DataFrame by total cases per 1 million population and divides that column by 1,000,000 to get the share of each country's population with confirmed cases. It colors each country red if its share is above the mean and blue otherwise, then creates a scatter plot using Plotly Express (px) in which marker size reflects the share. The plot is further customized with annotations for selected countries and a green line marking the mean. The resulting plot is displayed using fig.show().

In [25]:
sorted_by_cases_per_1m = df_summary.sort_values(['total_cases_per_1m_population'])
sorted_by_cases_per_1m['% of Population with Confirmed Cases'] = sorted_by_cases_per_1m['total_cases_per_1m_population']/1_000_000
mean = sorted_by_cases_per_1m['% of Population with Confirmed Cases'].mean()
sorted_by_cases_per_1m['color'] = sorted_by_cases_per_1m.apply(lambda row: "Red" if row['% of Population with Confirmed Cases'] > mean else "Blue", axis=1)
fig = px.scatter(sorted_by_cases_per_1m, x='country', y='% of Population with Confirmed Cases',
                 size='% of Population with Confirmed Cases',
                 color='color',
                 title=f"<b>Coronavirus Infection-Rate by Country as of {df.date.max().strftime('%Y-%m-%d')}</b>",
                 height=650)
fig.update_traces(marker_line_color='rgb(75,75,75)',
                  marker_line_width=1.5, opacity=0.8,
                  hovertemplate="<b>%{x}</b><br>%{y} of Population with Confirmed Cases<extra></extra>",)
fig.update_layout(showlegend=False,
                 yaxis={"tickformat":".3%", "range":[0,sorted_by_cases_per_1m['% of Population with Confirmed Cases'].max() * 1.1]},
                 xaxis={"title": ""},
                 title_font_size=20)


to_mention = ["China", "Australia", "India", "South Africa", "Russia", "Italy","Brazil", "UK", "France", "USA",  "Montenegro"]

for i, country in enumerate(to_mention):
    ay = 30 if i%2 else -30
    ax = 20
    if country == "USA": ay, ax = -30, -20
    if country == "UK": ax = -20
    if country == "France": ay, ax = -60, -40
    if country == "Russia": ax = -20
    if country == "Australia": ay = -30
    if country == "Brazil": ax = -20
    fig.add_annotation(
            x=country,
            y=sorted_by_cases_per_1m['% of Population with Confirmed Cases'][sorted_by_cases_per_1m.index[sorted_by_cases_per_1m.country==country][0]],
            xref="x",
            yref="y",
            text=country,
            showarrow=True,
            font=dict(
                family="Courier New, monospace",
                size=14,
                color="#ffffff"
                ),
            align="center",
            arrowhead=2,
            arrowsize=1,
            arrowwidth=2,
            arrowcolor="#636363",
            ax=ax,
            ay=ay,
            bordercolor="#c7c7c7",
            borderwidth=2,
            borderpad=4,
            bgcolor=sorted_by_cases_per_1m['color'][sorted_by_cases_per_1m.index[sorted_by_cases_per_1m.country==country][0]],
            opacity=0.6
            )

fig.add_shape(type='line',
              x0=sorted_by_cases_per_1m['country'].iloc[0], y0=mean,
              x1=sorted_by_cases_per_1m['country'].iloc[-1], y1=mean,
              line=dict(color='Green',width=1),
              xref='x', yref='y'
             )
fig.add_annotation(x=sorted_by_cases_per_1m['country'].iloc[0], y=mean,
                   text=f"mean = {mean*100:.2f}%",
                   showarrow=False,
                   xanchor="left",
                   yanchor="bottom",
                   font={"color":"Green", "size":14}
                  )
fig.show()
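
The two derived columns behind this chart can be sketched on three rows of the summary table (Afghanistan, Albania, Andorra). The column name share is hypothetical, and np.where is used here as a vectorized stand-in for the row-wise apply in the cell above.

```python
import numpy as np
import pandas as pd

# Three rows taken from the summary table shown earlier.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Andorra"],
    "total_cases_per_1m_population": [4420, 95954, 543983],
})

# Cases per million divided by one million gives a fraction of the population.
sample["share"] = sample["total_cases_per_1m_population"] / 1_000_000

# Red for countries above the mean share, blue for the rest.
mean_share = sample["share"].mean()
sample["color"] = np.where(sample["share"] > mean_share, "Red", "Blue")

print(sample)
```

The fraction is only displayed as a percentage later, through the axis tick format ".3%".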

The code sorts the df_summary DataFrame by total deaths per 1 million population, drops rows where that value is missing, and divides the column by 1,000,000 to get the share of each country's population that died of the coronavirus. It colors each country red if its share is above the mean and blue otherwise, then creates a scatter plot using Plotly Express (px) in which marker size reflects the share. The plot is further customized with annotations for selected countries and a green line marking the mean. The resulting plot is displayed using fig.show().

In [26]:
sorted_by_deaths_per_1m = df_summary.sort_values(['total_deaths_per_1m_population'])
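# .copy() makes the filtered frame independent, so the column assignments below do not trigger SettingWithCopyWarning
sorted_by_deaths_per_1m = sorted_by_deaths_per_1m[sorted_by_deaths_per_1m['total_deaths_per_1m_population'].notna()].copy()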
sorted_by_deaths_per_1m['% of Population with Coronavirus Death Cases'] = sorted_by_deaths_per_1m['total_deaths_per_1m_population']/1_000_000
mean = sorted_by_deaths_per_1m['% of Population with Coronavirus Death Cases'].mean()
sorted_by_deaths_per_1m['color'] = sorted_by_deaths_per_1m.apply(lambda row: "Red" if row['% of Population with Coronavirus Death Cases'] > mean else "Blue", axis=1)
#sorted_by_deaths_per_1m.dropna(inplace=True)
fig = px.scatter(sorted_by_deaths_per_1m, x='country', y='% of Population with Coronavirus Death Cases',
                 size='% of Population with Coronavirus Death Cases',
                 color='color',
                 title=f"<b>Coronavirus Death-Rate by Country as of {df.date.max().strftime('%Y-%m-%d')}</b>",
                 height=650)

fig.update_traces(marker_line_color='rgb(75,75,75)',
                  marker_line_width=1.5, opacity=0.8,
                  hovertemplate="<b>%{x}</b><br>%{y} of Population with Death Cases<extra></extra>",)
fig.update_layout(showlegend=False,
                 yaxis={"tickformat":".3%", "range":[0,sorted_by_deaths_per_1m['% of Population with Coronavirus Death Cases'].max() * 1.1]},
                 xaxis={"title": ""},
                 title_font_size=20)


to_mention = ["China", "Australia", "India", "South Africa", "Russia", "Italy","Brazil", "UK", "France", "USA",  "Bulgaria", "Peru"]

for i, country in enumerate(to_mention):
    ay = 30 if i%2 else -30
    ax = 20
    if country == "Russia": ax = -20
    if country == "Czech Republic": ay, ax = -30, -60
    if country == "USA": ay = 50
    if country == "Italy": ay, ax = 30, -20
    if country == "UK": ay, ax = -30, 40
    if country == "Australia": ay = -30
    if country == "France": ay, ax = -60, -40
    if country == "Brazil": ax = -20
    if country == "Peru": ay = -30
    fig.add_annotation(
            x=country,
            y=sorted_by_deaths_per_1m['% of Population with Coronavirus Death Cases'][sorted_by_deaths_per_1m.index[sorted_by_deaths_per_1m.country==country][0]],
            xref="x",
            yref="y",
            text=country,
            showarrow=True,
            font=dict(
                family="Courier New, monospace",
                size=14,
                color="#ffffff"
                ),
            align="center",
            arrowhead=2,
            arrowsize=1,
            arrowwidth=2,
            arrowcolor="#636363",
            ax=ax,
            ay=ay,
            bordercolor="#c7c7c7",
            borderwidth=2,
            borderpad=4,
            bgcolor=sorted_by_deaths_per_1m['color'][sorted_by_deaths_per_1m.index[sorted_by_deaths_per_1m.country==country][0]],
            opacity=0.6
            )

fig.add_shape(type='line',
              x0=sorted_by_deaths_per_1m['country'].iloc[0], y0=mean,
              x1=sorted_by_deaths_per_1m['country'].iloc[-1], y1=mean,
              line=dict(color='Green',width=1),
              xref='x', yref='y'
             )
fig.add_annotation(x=sorted_by_deaths_per_1m['country'].iloc[0], y=mean,
                   text=f"mean = {mean*100:.2f}%",
                   showarrow=False,
                   xanchor="left",
                   yanchor="bottom",
                   font={"color":"Green", "size":14}
                  )

fig.show()
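
The annotation loop finds each country's value by first locating its row label and then indexing the column with it. A shorter equivalent, sketched on three rows of the summary table, indexes the frame by country name. The column name death_share is hypothetical.

```python
import pandas as pd

# Three rows taken from the summary table shown earlier.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "total_deaths_per_1m_population": [190.0, 1218.0, 152.0],
})
sample["death_share"] = sample["total_deaths_per_1m_population"] / 1_000_000

# Index by country so a single value can be read by name.
by_country = sample.set_index("country")
albania_share = by_country.at["Albania", "death_share"]

print(f"Albania: {albania_share:.3%}")
```

This assumes country names are unique; with duplicates, .at would return more than one value.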

The code estimates the severity of the coronavirus in each country as the ratio of total deaths to total confirmed cases in the df_summary DataFrame. It sorts the DataFrame by this ratio, drops rows where the ratio is missing, and colors each country red if its ratio is above the mean and blue otherwise. It then creates a scatter plot using Plotly Express (px) in which marker size reflects the ratio. The plot is further customized with annotations for selected countries and a green line marking the mean. The resulting plot is displayed using fig.show().

In [27]:
df_summary["Coronavirus Deaths/Confirmed Cases"] = df_summary["total_deaths"] / df_summary["total_confirmed"]
sorted_by_deaths_per_confirmed = df_summary.sort_values(['Coronavirus Deaths/Confirmed Cases'])
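# .copy() makes the filtered frame independent, so the 'color' assignment below does not trigger SettingWithCopyWarning
sorted_by_deaths_per_confirmed = sorted_by_deaths_per_confirmed[sorted_by_deaths_per_confirmed['Coronavirus Deaths/Confirmed Cases'].notna()].copy()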
mean = sorted_by_deaths_per_confirmed['Coronavirus Deaths/Confirmed Cases'].mean()
sorted_by_deaths_per_confirmed['color'] = sorted_by_deaths_per_confirmed.apply(lambda row: "Red" if row['Coronavirus Deaths/Confirmed Cases'] > mean else "Blue", axis=1)
fig = px.scatter(sorted_by_deaths_per_confirmed, x='country', y='Coronavirus Deaths/Confirmed Cases',
                 size='Coronavirus Deaths/Confirmed Cases',
                 color='color',
                 title=f"<b>Coronavirus severity by Country as of {df.date.max().strftime('%Y-%m-%d')}</b>",
                 height=650)

fig.update_traces(marker_line_color='rgb(75,75,75)',
                  marker_line_width=1.5, opacity=0.8,
                  hovertemplate="<b>%{x}</b><br>%{y} of Confirmed Cases Resulting in Death<extra></extra>",)
fig.update_layout(showlegend=False,
                 yaxis={"tickformat":".3%", "range":[0,sorted_by_deaths_per_confirmed['Coronavirus Deaths/Confirmed Cases'].max() * 1.1]},
                 xaxis={"title": ""},
                 title_font_size=20)


to_mention = ["China", "Australia", "India", "South Africa", "Russia", "Italy","Brazil", "UK", "France", "USA", "Yemen", "Vanuatu"]

# Annotate the selected countries; the ax/ay arrow offsets are hand-tuned so labels do not overlap
for i, country in enumerate(to_mention):
    ay = 30 if i%2 else -30
    ax = 20
    if country in ["India", "USA", "Russia"]: ax = -20
    if country == "Yemen": ay = 30
    if country == "UK": ay, ax = -60, -40
    if country == "Belgium": ay, ax = -30, -60
    if country == "USA": ay, ax = -30, 40
    if country == "Italy": ax = -40
    if country == "Australia": ay = -30
    if country == "France": ay, ax = -60, 40
    if country == "Brazil": ay, ax = -60, -20
    fig.add_annotation(
            x=country,
            y=sorted_by_deaths_per_confirmed['Coronavirus Deaths/Confirmed Cases'][sorted_by_deaths_per_confirmed.index[sorted_by_deaths_per_confirmed.country==country][0]],
            xref="x",
            yref="y",
            text=country,
            showarrow=True,
            font=dict(
                family="Courier New, monospace",
                size=14,
                color="#ffffff"
                ),
            align="center",
            arrowhead=2,
            arrowsize=1,
            arrowwidth=2,
            arrowcolor="#636363",
            ax=ax,
            ay=ay,
            bordercolor="#c7c7c7",
            borderwidth=2,
            borderpad=4,
            bgcolor=sorted_by_deaths_per_confirmed['color'][sorted_by_deaths_per_confirmed.index[sorted_by_deaths_per_confirmed.country==country][0]],
            opacity=0.6
            )

fig.add_shape(type='line',
              x0=sorted_by_deaths_per_confirmed['country'].iloc[0], y0=mean,
              x1=sorted_by_deaths_per_confirmed['country'].iloc[-1], y1=mean,
              line=dict(color='Green',width=1),
              xref='x', yref='y'
             )
fig.add_annotation(x=sorted_by_deaths_per_confirmed['country'].iloc[0], y=mean,
                   text=f"mean = {mean*100:.2f}%",
                   showarrow=False,
                   xanchor="left",
                   yanchor="bottom",
                   font={"color":"Green", "size":14}
                  )

fig.show()
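
The ratio, filtering, and coloring steps in the cell above can be checked in isolation on a small table. The following is a minimal sketch, assuming a hypothetical toy_summary DataFrame with made-up numbers in place of df_summary; it leaves out the plotting entirely.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for df_summary (values are made up)
toy_summary = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "total_confirmed": [1000.0, 2000.0, 0.0, 500.0],
    "total_deaths": [10.0, 100.0, 0.0, 5.0],
})

ratio_col = "Coronavirus Deaths/Confirmed Cases"
toy_summary[ratio_col] = toy_summary["total_deaths"] / toy_summary["total_confirmed"]

# 0/0 yields NaN, so country C (no confirmed cases) is dropped here
ranked = toy_summary.dropna(subset=[ratio_col]).sort_values(ratio_col).copy()

# Label each country relative to the mean ratio of the remaining rows
mean_ratio = ranked[ratio_col].mean()
ranked["color"] = np.where(ranked[ratio_col] > mean_ratio, "Red", "Blue")

print(ranked[["country", ratio_col, "color"]])
print(f"mean = {mean_ratio * 100:.2f}%")
```

Note that the mean here is an unweighted mean of per-country ratios, as in the cell above, so a small country counts as much as a large one.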

5. Current and History of Distribution of Active Cases

This cell prepares and visualizes global active COVID-19 cases over time. It selects the date, country, and active_cases columns from the df DataFrame, drops rows with missing values, keeps only rows with a positive number of active cases, and computes the base-2 logarithm of the active cases so that countries with very different case counts can share one color scale. It then builds an animated choropleth map with Plotly Express (px), with one frame per date and color encoding the logarithm of active cases. The colorbar ticks come from log_scale_vals and scale_vals, which are not defined in this cell and must already exist in the notebook session. The frame and transition durations are shortened to speed up playback, and the map is displayed with fig.show().

In [28]:
active_cases_df = df[['date', 'country', 'active_cases']].dropna().sort_values('date')
active_cases_df = active_cases_df[active_cases_df.active_cases > 0]
active_cases_df['log2(active_cases)'] = np.log2(active_cases_df['active_cases'])
active_cases_df['date'] = active_cases_df['date'].dt.strftime('%m/%d/%Y')

fig = px.choropleth(active_cases_df, locations="country", locationmode='country names',
                    color="log2(active_cases)", hover_name="country", hover_data=['active_cases'],
                    projection="natural earth", animation_frame="date",
                    title='<b>Coronavirus Global Active Cases Over Time</b>',
                    color_continuous_scale="reds",
                   )

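# The colorbar title is given in nested form (text + side) rather than the older "titleside" key
fig.update_layout(coloraxis={"colorbar": {"title": {"text": "<b>Active Cases</b><br>", "side": "top"},
                                          "tickmode":"array",
                                          "tickvals":log_scale_vals,
                                          "ticktext":scale_vals}
                            }
                 )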

fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 10
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 2

fig.show()
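
The colorbar in the cell above depends on two lists, log_scale_vals and scale_vals, that are not defined in the cell itself. The following is a minimal sketch of how such lists might be built, assuming the ticks should sit at powers of ten; the names scale_vals_sketch and log_scale_vals_sketch and the maximum of 5,000,000 are illustrative assumptions, not the notebook's actual definitions.

```python
import math

# Hypothetical largest active-case count; in the notebook this could be
# taken from active_cases_df['active_cases'].max()
max_active = 5_000_000

# Number of whole powers of ten that fit below the maximum
n_decades = int(math.log10(max_active))

# Human-readable tick labels: 1, 10, 100, ...
scale_vals_sketch = [10 ** k for k in range(n_decades + 1)]

# Tick positions on the color axis, which is in log2 units
log_scale_vals_sketch = [math.log2(v) for v in scale_vals_sketch]

for label, position in zip(scale_vals_sketch, log_scale_vals_sketch):
    print(f"{label:>9} -> {position:.2f}")
```

Because the map colors log2(active_cases), the tick positions have to be in log2 units while the tick text shows the original counts.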

This cell draws an area plot with Plotly Express (px) of active COVID-19 cases over time for the 20 countries with the most active cases on the latest date in the df DataFrame. The plot uses the plotly_dark template and is customized with a thinner line width, a title, and axis labels.

In [29]:
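# Countries with the 20 largest active-case counts on the latest date
latest_day = df[df.date == df.date.max()]
top20_countries = latest_day.sort_values("active_cases", ascending=False).iloc[:20].country
# Full history for those countries
top20_history = df[df.country.isin(top20_countries)].sort_values("active_cases", ascending=False)

fig = px.area(top20_history, x="date", y="active_cases", color="country", template="plotly_dark")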

fig.update_traces(line={"width":1.25})
fig.update_layout(title = f"Top 20 Countries with Most Active Cases on {df.date.max().strftime('%Y-%m-%d')}",
                  xaxis={"title": ""},
                  yaxis={"title":"Active Cases"})
fig.show()
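
The selection step in the cell above (pick the top countries on the latest date, then keep their full history) can be tried on a small table. The following is a minimal sketch, assuming a hypothetical toy DataFrame with made-up values; it uses nlargest in place of the sort_values and iloc combination from the cell, which selects the same rows.

```python
import pandas as pd

# Hypothetical toy data with the same columns the cell relies on
toy = pd.DataFrame({
    "date": pd.to_datetime(["2022-05-09"] * 3 + ["2022-05-10"] * 3),
    "country": ["A", "B", "C"] * 2,
    "active_cases": [50.0, 10.0, 30.0, 5.0, 40.0, 20.0],
})

top_n = 2  # the notebook uses 20

# Rows for the most recent date only
latest = toy[toy["date"] == toy["date"].max()]

# Countries with the largest active-case counts on that date
top_countries = latest.nlargest(top_n, "active_cases")["country"]

# Full history for the selected countries, ordered for readability
history = toy[toy["country"].isin(top_countries)].sort_values(["country", "date"])
print(history)
```

Country A has the most active cases on the earlier date but is excluded, because the ranking looks only at the latest date.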

This cell draws a treemap with Plotly Express (px) that breaks down active COVID-19 cases by country, using the df_summary DataFrame and the latest date in df for the title. Each tile is a country, and its area is proportional to that country's number of active cases. The tiles show the country label and its value, and the figure is displayed with fig.show().

In [30]:
fig = px.treemap(df_summary, path=["country"], values="active_cases", height = 750,
                 title=f"<b>Active Cases Breakdown on {df.date.max().strftime('%Y-%m-%d')}</b>",
                 color_discrete_sequence = px.colors.qualitative.Set3)

fig.update_traces(textinfo = "label+text+value")
fig.show()
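
Since each tile's area is proportional to a country's active cases, the same breakdown can be read as each country's share of the total. The following is a minimal sketch, assuming a hypothetical toy_active DataFrame with made-up values in place of df_summary; the share column is an illustrative addition and does not appear in the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for df_summary (values are made up)
toy_active = pd.DataFrame({
    "country": ["A", "B", "C"],
    "active_cases": [600.0, 300.0, 100.0],
})

# Fraction of all active cases that each country accounts for
toy_active["share"] = toy_active["active_cases"] / toy_active["active_cases"].sum()

print(toy_active.sort_values("share", ascending=False))
```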